Troubleshooting Postgres

PGConf Europe Berlin
2022-10-25

logo EDB

Who am I

Image by Anemone123 from Pixabay

Troubleshooting

  • Introduction
  • Can't connect
  • Can't start
  • Can't replicate
  • Corruption
Image par Mike Sweeney de Pixabay
logo EDB

Introduction

logo EDB

Pre-requisites

logo EDB

Basics

  • Listen to what people say
  • Everybody lies
  • Read the logs
Image par TheUjulala de Pixabay
logo EDB

Finding logs

diagram to find logs depending on Postgres setting logo EDB

Finding logs

diagram to find logs depending on Postgres setting logo EDB logo EDB

Finding logs

diagram to find logs depending on Postgres setting logo EDB logo EDB

Finding logs

diagram to find logs depending on Postgres setting logo EDB logo EDB

Finding logs

diagram to find logs depending on Postgres setting logo EDB logo EDB

Finding logs

diagram to find logs depending on Postgres setting logo EDB logo EDB

Setting logs

log_destination=csvlog
logging_collector=on
log_directory= whatever (not the same disk as $PGDATA)
log_filename= whatever
log_min_duration_statement=250ms (or 1s or 5s)
log_min_messages='WARNING'
log_autovacuum_min_duration=0
log_checkpoints=on
log_connections=on
log_disconnections=on
log_lock_waits=on
log_statement='ddl'
log_temp_files=0
logo EDB

Analyzing logs

  • Not in production
  • On your own computer
  • (or a dedicated server)
Image par Dmitriy de Pixabay
logo EDB

Analyzing logs

create server pglog foreign data wrapper file_fdw;
create foreign table([paste definition found in doc here])
  server pglog
  options ([filename], format 'csv' );
Remember SQL is Turing complete!
Image par Dmitriy de Pixabay
logo EDB

Analyzing logs

  • When you have time
    • Use SQL
  • During emergency
    • Use your human brain (and basic tools)

Get your hands dirty!

  • Can't connect
  • Can't start
  • Can't replicate
  • Corrupted
logo EDB

Containers

  • Dockerfile
  • Build the image
  • Start the container
  • Fix the problem
Image par Ulrike Leone de Pixabay
logo EDB

Containers cheatsheet

Building the image
docker build -t imgname .
Create the container
docker run -d -P --name name imgname
List the containers
docker ps
Start/Stop the container
docker stop containerid
docker start containerid
Remove the container
docker rm [-f] containerid
Connecting to the container
docker exec -it name /bin/whatever
Image par Andy Barbour de Pexels
logo EDB

Can't connect

Someone you don't know tells you:

I can't connect to Postgres. It does not work. Postgres sucks I shouldn't have chosen that database. It's too hard to use.

What do you do?

logo EDB

Checklist

  1. Rephrase the problem to make sure you have understood correctly
  2. Ask questions
  3. Double check every claim
  4. Ask to take control
logo EDB

What the user is trying to do

laetitia:~/01-CantConnect|⇒  ./spin_up_postgres.sh

[...]

laetitia:~/01-CantConnect|⇒  psql -h localhost
psql: error: connection to server at "localhost" (::1),
             port 5432 failed: Connection refused
        Is the server running on that host and accepting
        TCP/IP connections?
             connection to server at "localhost" (127.0.0.1),
             port 5432 failed: Connection refused
        Is the server running on that host and accepting
        TCP/IP connections?
Now, go fix this!
Image par Gerhard Litz de Pixabay
logo EDB

Can we connect locally?

laetitia:~/01-CantConnect|⇒  docker exec -it 01-cantconnect \
 /bin/psql
psql (15.0 (Ubuntu 15.0-1.pgdg20.04+1))
Type "help" for help.

postgres=#
Yes
Image par Gerhard Litz de Pixabay
logo EDB

Can we connect locally using the same credentials?

laetitia:~/01-CantConnect|⇒  docker exec -it 01-cantconnect \
 /bin/psql -U laetitia laetitia

psql: error: connection to server on socket
 "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: 
 Peer authentication failed for user "laetitia"
No
Image par Gerhard Litz de Pixabay
logo EDB

pg_hba.conf

  • Controls client authentication
  • Works as a Linux firewall
Image par Gerhard Litz de Pixabay
logo EDB

pg_hba.conf

laetitia:~/01-CantConnect|⇒  docker exec -it 01-cantconnect
 /bin/sh

$ cat /etc/postgresql/15/main/pg_hba.conf
# Database administrative login by Unix domain socket
local   all   postgres                peer

# TYPE  DB    USER      ADDRESS       METHOD

# "local" is for Unix domain socket connections only
local   all  all                      peer
# IPv4 local connections:
host    all  all        127.0.0.1/32  scram-sha-256
# IPv6 local connections:
host    all  all        ::1/128       scram-sha-256
Image par Gerhard Litz de Pixabay
logo EDB

Can we connect with the same credentials and same auth method locally?

laetitia:~/01-CantConnect|⇒  docker exec -it 01-cantconnect \
 /bin/psql -h localhost -U laetitia laetitia

Password for user laetitia:
psql (15.0 (Ubuntu 15.0-1.pgdg20.04+1))
SSL connection (protocol: TLSv1.3,
 cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

laetitia=#
Yes
Image par Gerhard Litz de Pixabay
logo EDB

Let's try again from the laptop

laetitia:~|⇒ psql -h localhost -U laetitia laetitia

psql: error: connection to server at "localhost" (::1),
 port 5432 failed: Connection refused
 Is the server running on that host and accepting
 TCP/IP connections?
connection to server at "localhost" (127.0.0.1),i
 port 5432 failed: Connection refused
 Is the server running on that host and accepting
 TCP/IP connections
No
Image par Gerhard Litz de Pixabay
logo EDB

Is my port open?

(echo >/dev/tcp/127.0.0.1/5432) &>/dev/null \
&& echo "open" || echo "close"

close
No
Image par Gerhard Litz de Pixabay
logo EDB

It's a container!

laetitia:~/01-CantConnect|⇒  grep -i expose Dockerfile 

# Expose the PostgreSQL port
EXPOSE 5432
laetitia:~/01-CantConnect|⇒  cat spin_up_postgres.sh 
docker build -t cantconnect .

docker run -d -P --name 01-cantconnect cantconnect
Seems correct
Image par Gerhard Litz de Pixabay
logo EDB

Is it the correct port?

laetitia:~/01-CantConnect|⇒ docker ps --format \
"table {{.ID}}\t{{.Image}}\t{{.Ports}}" 

CONTAINER ID   IMAGE         PORTS
9dcad71213a0   cantconnect   0.0.0.0:55009->5432/tcp
The port is 55009!
Image par Gerhard Litz de Pixabay
logo EDB

Is it the correct port?

laetitia:~/01-CantConnect|⇒ psql -h localhost -p 55009\
 -U laetitia laetitia

psql: error: connection to server at "localhost" (::1),\
 port 55009 failed: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
We're getting closer!
Image par Gerhard Litz de Pixabay
logo EDB

Wait a minute... Listen addresses!!!

laetitia:~/01-CantConnect|⇒  docker exec -it 01-cantconnect \
/bin/sh

$ grep listen_ad /etc/postgresql/15/main/postgresql.conf
#listen_addresses = 'localhost'
We need to change that
Image par Gerhard Litz de Pixabay
logo EDB

Listen_addresses

  • Specifies the TCP/IP address(es) on which the server is to listen for connections
  • Change it to '*'
  • Needs a restart
docker stop dockerId
docker start dockerId
Image par Gerhard Litz de Pixabay
logo EDB

Trying again

laetitia:~/01-CantConnect|⇒  docker ps --format \
"table {{.ID}}\t{{.Image}}\t{{.Ports}}"

CONTAINER ID   IMAGE         PORTS
b44bcd52f08c   cantconnect   0.0.0.0:55016->5432/tcp

laetitia:~/01-CantConnect|⇒  psql -h 127.0.0.1 -p 55016
psql: error: connection to server at "127.0.0.1",
 port 55016 failed:
 FATAL:  no pg_hba.conf entry for host "172.17.0.1",
 user "laetitia", database "laetitia", no encryption
We're getting closer!
Image par Gerhard Litz de Pixabay
logo EDB

pg_hba.conf

laetitia:~/01-CantConnect|⇒  docker exec -it 01-cantconnect \
 /bin/sh
$ vi /etc/postgresql/15/main/pg_hba.conf
$ cat /etc/postgresql/15/main/pg_hba.conf

# Database administrative login by Unix domain socket
local   all   postgres                 peer

# TYPE  DB    USER      ADDRESS        METHOD

# "local" is for Unix domain socket connections only
local   all  all                       peer
# IPv4 local connections:
host    all  all        127.0.0.1/32   scram-sha-256
host    all  all        172.17.0.1/32  scram-sha-256
Image par Gerhard Litz de Pixabay
logo EDB

pg_hba.conf

Don't forget to reload the configuration
$ psql
psql (15.0 (Ubuntu 15.0-1.pgdg20.04+1))
Type "help" for help.

postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
Image par Gerhard Litz de Pixabay
logo EDB

Let's try again!

laetitia:~/01-CantConnect|⇒  psql -h 127.0.0.1 -p 55016 \
-U laetitia
Password for user laetitia: 
psql (16devel, server 15.0 (Ubuntu 15.0-1.pgdg20.04+1))
Type "help" for help.

laetitia=#
Problem solved!
Image par Gerhard Litz de Pixabay
logo EDB

Can't start

One of your collegue comes into your office and say

I don't know what happened. We stopped Postgres for maintenance reasons and now it refuses to start. I don't know what's wrong. We didn't change anything!

What do you do?

Image par Adri Marie de Pixabay
logo EDB

What the user is trying to do

laetitia:~/02-CantStart|⇒ docker exec -it 02-cantstart /bin/sh

$ /usr/lib/postgresql/15/bin/postgres \
  -D /var/lib/postgresql/15/main \
  -c config_file=/etc/postgresql/15/main/postgresql.conf

LOG:  skipping missing configuration file "/var/lib/postgresql/15/main/postgresql.auto.conf"
LOG:  invalid value for parameter "wal_level": "maximal"
HINT:  Available values: minimal, replica, logical.
FATAL:  configuration file
 "/etc/postgresql/15/main/postgresql.conf" contains errors
They didn't change anything?
Image par Adri Marie de Pixabay
logo EDB

Changing the configuration file

  • Open the configuration file
  • Find wal_level
  • Fix it vi /etc/postgresql/15/main/postgresql.conf


That one was easy
Image par Adri Marie de Pixabay
logo EDB

Let's try again

$ /usr/lib/postgresql/15/bin/postgres \
  -D /var/lib/postgresql/15/main \
  -c config_file=/etc/postgresql/15/main/postgresql.conf

LOG:  skipping missing configuration file "/var/lib/postgresql/15/main/postgresql.auto.conf"
FATAL:  data directory "/var/lib/postgresql/15/main" has wrong ownership
HINT:  The server must be started by the user that owns the data directory. 
What?
Image par Adri Marie de Pixabay
logo EDB

Wrong permissions

$ ls -l /var/lib/postgresql/15/main
ls: cannot access '/var/lib/postgresql/15/main/pg_xact': Permission denied
ls: cannot access '/var/lib/postgresql/15/main/pg_wal': Permission denied
ls: cannot access '/var/lib/postgresql/15/main/pg_subtrans': Permission denied
ls: cannot access '/var/lib/postgresql/15/main/pg_commit_ts': Permission denied
 [...]

$ whoami
postgres
We need super user permissions
Image par Adri Marie de Pixabay
logo EDB

How to be root?

  • sudo won't help
  • change your connection command
  • docker exec -it -u 0 name cmd
Image par Adri Marie de Pixabay
logo EDB

Being root

laetitia:~/02-CantStart|⇒ docker exec -it -u 0 02-cantstart /bin/sh

# ls -l /var/lib/postgresql/15
total 4
drw-rw-rw- 19 root root 4096 Oct 23 19:43 main

# ls -l /var/lib/postgresql/15/main
total 84
-rw-rw-rw- 1 root root    3 Oct 22 20:22 PG_VERSION
drw-rw-rw- 6 root root 4096 Oct 23 19:43 base
drw-rw-rw- 2 root root 4096 Oct 23 19:43 global
drw-rw-rw- 2 root root 4096 Oct 22 20:22 pg_commit_ts

# chown -R postgres:postgres /var/lib/postgresql/15/main
Fixed!
Image par Adri Marie de Pixabay
logo EDB

Let's try again

$ /usr/lib/postgresql/15/bin/postgres \
  -D /var/lib/postgresql/15/main \
  -c config_file=/etc/postgresql/15/main/postgresql.conf

LOG:  skipping missing configuration file "/var/lib/postgresql/15/main/postgresql.auto.conf"
FATAL:  data directory "/var/lib/postgresql/15/main" has invalid
 permissions
DETAIL:  Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
Seriously?
Image par Adri Marie de Pixabay
logo EDB

Being root

laetitia:~/02-CantStart|⇒ docker exec -it -u 0 02-cantstart /bin/sh

# ls -l /var/lib/postgresql/15
total 4
drw-rw-rw- 19 postgres postgres 4096 Oct 23 19:43 main

# ls -l /var/lib/postgresql/15/main
total 84
-rw-rw-rw- 1 postgres postgres    3 Oct 22 20:22 PG_VERSION
drw-rw-rw- 6 postgres postgres 4096 Oct 23 19:43 base
drw-rw-rw- 2 postgres postgres 4096 Oct 23 19:43 global
drw-rw-rw- 2 postgres postgres 4096 Oct 22 20:22 pg_commit_ts

# chmod -R 0700 /var/lib/postgresql/15/main
Fixed!
Image par Adri Marie de Pixabay
logo EDB

Let's try again

$ /usr/lib/postgresql/15/bin/postgres \
  -D /var/lib/postgresql/15/main \
  -c config_file=/etc/postgresql/15/main/postgresql.conf

LOG:  starting PostgreSQL 15.0 (Ubuntu 15.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
LOG:  listening on IPv4 address "0.0.0.0", port 5432
LOG:  listening on IPv6 address "::", port 5432
LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
LOG:  database system was interrupted; last known up at 2022-10-23 20:46:37 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  invalid record length at 0/1915CC8: wanted 24, got 0
LOG:  redo is not required
LOG:  checkpoint starting: end-of-recovery immediate wait
LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.278 s, sync=0.002 s, total=0.285 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB
LOG:  database system is ready to accept connections
Problem solved
Image par Adri Marie de Pixabay
logo EDB

Bonus

The person comes back later saying there is something weird with Postgres: even though they changed Postgres setting shared_buffers to 128MB, it seems Postgres is taken more memory than that.

What do you do?
Image par Adri Marie de Pixabay
logo EDB

Postgres settings

postgres=# select name, setting, unit, source  
postgres-# from pg_settings
postgres-# where name ~ 'shared_buffers';
name      | setting | unit |       source       
----------------+---------+------+--------------------
 shared_buffers | 4194304 | 8kB  | configuration file
 (1 row)

postgres=# select 4194304*8/1024/1024 || 'GB' as shared_buffers;
 shared_buffers 
----------------
 32GB
(1 row)

postgres=# \! grep -E "^shared_buffers" /etc/postgresql/15/main/postgresql.conf
shared_buffers = 128MB                  # min 128kB
Image par Adri Marie de Pixabay
logo EDB

Postgres settings

postgres=# select name, source, sourcefile
from pg_settings
where name ~ 'shared_buffers';
      name      |       source       |                    sourcefile                    
----------------+--------------------+--------------------------------------------------
 shared_buffers | configuration file | /var/lib/postgresql/15/main/postgresql.auto.conf
(1 row)

postgres=# \! cat /var/lib/postgresql/15/main/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
shared_buffers = '32GB'

postgres=# alter system reset shared_buffers;
ALTER SYSTEM
Don't forget to restart
Image par Adri Marie de Pixabay
logo EDB

It can't replicate

Someone you don't know tells you:

I don't know what happened. My replication stopped working. I swear I didn't change anything. My user to replicate is called replicator and its password is r3pl1c4t0r.

What do you do?

logo EDB

What the user is trying to do

laetitia:~/03-CantReplicate|⇒ ./spin_up_postgres.sh 

[...]

laetitia:~/03-Corrupted|⇒ docker exec -it 03-cantreplicateprincess /bin/sh

$ /usr/lib/postgresql/15/bin/postgres -D /var/lib/postgresql/15/main \
-c config_file=/etc/postgresql/15/main/postgresql.conf &
LOG:  consistent recovery state reached at 0/2000100
LOG:  database system is ready to accept read-only connections
postgresql/.pgpass" has group or world access;
 permissions should be u=rw (0600) or less
FATAL:  could not connect to the primary server: connection to server
 at "172.17.0.3", port 5432 failed: fe_sendauth: no password supplied
WARNING: password file "/var/lib/postgresql/.pgpass" has group or world
 access; permissions should be u=rw (0600) or less 
Go fix this! logo EDB

Password file permissions

laetitia:~/03-CantReplicate|⇒ docker exec -it 03-cantreplicateprincess \
  /bin/sh

$ ls -l ~/.pgpass
-rw-rw-rw- 1 postgres postgres 31 Oct 24 22:29 /var/lib/postgresql/.pgpass

$ chmod 600 ~/.pgpass
Fixed! logo EDB

Is it really fixed?

laetitia:~/03-CantReplicate|⇒ docker exec -it 03-cantreplicateprincess \
  /bin/sh

$ /usr/lib/postgresql/15/bin/postgres -D /var/lib/postgresql/15/main \
> -c config_file=/etc/postgresql/15/main/postgresql.conf &
$ 2022-10-24 23:14:59.484 UTC [212] LOG:  starting PostgreSQL 15.0 (Ubuntu 15.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit

[...]

LOG:  database system is ready to accept read-only connections
LOG:  invalid record length at 0/3000148: wanted 24, got 0
LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
Go fix this! logo EDB

You need a backup

laetitia:~/03-CantReplicate|⇒ docker exec -it 03-cantreplicatequeen \
/usr/sbin/ifconfig eth0 | awk '/inet/ { $2; print $2}'
172.17.0.3

laetitia:~/03-CantReplicate|⇒  docker exec -it 03-cantreplicateprincess \
  /bin/sh

$ rm -rf /var/lib/postgresql/15/main/*
$ pg_basebackup -w -U replicator -X stream -R -c fast \
-h 172.17.0.3 -D /var/lib/postgresql/15/main
Fixed! logo EDB

Checking

laetitia:~/03-CantReplicate|⇒  docker exec -it 03-cantreplicateprincess /bin/sh

$ /usr/lib/postgresql/15/bin/postgres -D /var/lib/postgresql/15/main \
 -c config_file=/etc/postgresql/15/main/postgresql.conf &

laetitia:~/03-CantReplicate|⇒  docker exec -it 03-cantreplicatequeen psql -c "SELECT * FROM pg_stat_replication" -x
-[ RECORD 1 ]----+------------------------------
pid              | 272
usesysid         | 24576
usename          | replicator
application_name | 15/main
client_addr      | 172.17.0.4
client_hostname  | 
client_port      | 49938
backend_start    | 2022-10-24 23:26:57.384432+00
reply_time       | 2022-10-24 23:27:17.650712+00 
Fixed! logo EDB

It's corrupted

Someone you don't know tells you:

Postgres won't start. I don't know why. Please save my life and solve it! By the way, I should have a backup under /var/lib/postgresql/backups/main.

What do you do?

logo EDB

What the user is trying to do

laetitia:~/04-Corrupted|⇒ ./spin_up_postgres.sh 

[...]

$ /usr/lib/postgresql/15/bin/postgres \
  -D /var/lib/postgresql/15/main \
    -c config_file=/etc/postgresql/15/main/postgresql.conf &
 PANIC:  could not read file "global/pg_control": read 0 of 296
 
Not good!
Image par Adri Marie de Pixabay
logo EDB

Fixing the problem

laetitia:~/04-Corrupted|⇒ ./spin_up_postgres.sh

[...]

$ cp -R /var/lib/postgresql/backups/main/* /var/lib/postgresql/15/main/

$ /usr/lib/postgresql/15/bin/postgres \
  -D /var/lib/postgresql/15/main \
    -c config_file=/etc/postgresql/15/main/postgresql.conf &
Image par Adri Marie de Pixabay
logo EDB

Preventing silence corruption

  • Activating checksum
  • CHeck the checksums while backuping
  • Dump the database
  • Monitor the logfile
Image par Adri Marie de Pixabay
logo EDB

The end

Image par 00luvicecream de Pixabay
logo EDB